Comparing Subreddits

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import warnings
Code
%store -r reddit_sent_df
Code
sentiment_by_sub = reddit_sent_df.groupby(['covid_period', 'subreddit'])['compound'].mean().reset_index()
Code
sentiment_by_sub['covid_period'] = pd.Categorical(sentiment_by_sub['covid_period'], 
                                                  categories=['Pre-COVID', 'During COVID', 'Post-COVID'], 
                                                  ordered=True)
Code
sentiment_by_sub = sentiment_by_sub.sort_values('covid_period')

This groups the data by COVID period and subreddit, then calculates the mean compound sentiment score for each combination. The compound score is from VADER sentiment analysis (ranges from -1 most negative to +1 most positive). The code then converts the period to an ordered categorical variable to ensure proper chronological ordering in visualizations.
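A minimal sketch of the same aggregation on a toy frame shows why the ordered categorical matters: plain string sorting would put "During COVID" first and "Pre-COVID" last. The frame `toy_df` and its scores are made up for illustration and are not data from this project.

```python
import pandas as pd

# Hypothetical toy data; the scores are invented for illustration only.
toy_df = pd.DataFrame({
    "covid_period": ["Post-COVID", "Pre-COVID", "During COVID", "Pre-COVID"],
    "subreddit": ["Anxiety", "Anxiety", "Anxiety", "Anxiety"],
    "compound": [-0.30, -0.10, -0.20, -0.20],
})

period_order = ["Pre-COVID", "During COVID", "Post-COVID"]

# Mean compound score per (period, subreddit) pair.
toy_means = (
    toy_df.groupby(["covid_period", "subreddit"])["compound"]
    .mean()
    .reset_index()
)

# Plain strings sort alphabetically, which is not chronological.
alphabetical = sorted(toy_means["covid_period"])

# An ordered categorical sorts by the declared category order instead.
toy_means["covid_period"] = pd.Categorical(
    toy_means["covid_period"], categories=period_order, ordered=True
)
chronological = toy_means.sort_values("covid_period")["covid_period"].tolist()

print(alphabetical)
print(chronological)
```

The first list comes out in alphabetical order, the second in the declared chronological order.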

Code
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"

fig = px.bar(
    sentiment_by_sub,
    x='covid_period',
    y='compound',
    color='subreddit',
    barmode='group',
    title='Average Sentiment Score by Subreddit and COVID Period'
)
fig.show()

Creates an interactive grouped bar chart using Plotly Express. Each subreddit is represented by a different color, and bars are grouped by COVID period on the x-axis.

Sentiment Chart Interpretation

Pre-COVID

  • r/Anxiety: ~-0.14
  • r/depression: ~-0.18 (most negative)
  • r/mentalhealth: ~-0.11 (least negative)

Interpretation

All three subreddits show negative sentiment scores even before COVID, which is expected given they’re mental health support communities. r/mentalhealth is slightly less negative, possibly because it’s more focused on general wellness and recovery rather than specific disorders.

During COVID

  • r/Anxiety: ~-0.23 (worsened significantly)
  • r/depression: ~-0.21 (worsened)
  • r/mentalhealth: ~-0.14 (worsened but remained least negative)

Interpretation

All subreddits saw sentiment decline during COVID, with r/Anxiety experiencing the largest drop (from -0.14 to -0.23, a 64% increase in negativity). This aligns with research showing COVID-19 disproportionately impacted people with anxiety disorders due to uncertainty, isolation, and health fears. r/depression also worsened but less dramatically, while r/mentalhealth maintained its position as the least negative space.

Post-COVID

  • r/Anxiety: ~-0.30 (continued worsening)
  • r/depression: ~-0.28 (continued worsening)
  • r/mentalhealth: ~-0.16 (continued worsening, still least negative)

Interpretation

This is the most alarming finding: sentiment continued to deteriorate post-COVID rather than recovering. r/Anxiety hit its lowest point at -0.30, representing a 114% increase in negativity from pre-COVID. r/depression also reached its nadir. This suggests the mental health crisis intensified after the acute pandemic phase, possibly due to accumulated trauma, ongoing disruption, or delayed mental health consequences.
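The percentage figures quoted above are relative changes in the magnitude of the mean compound score. A short worked check follows, using the approximate values read off the chart, so the results are approximate too. The helper name `pct_increase_in_negativity` is introduced here for illustration only.

```python
def pct_increase_in_negativity(baseline, later):
    """Relative growth in the magnitude of a negative mean score, in percent."""
    return (abs(later) - abs(baseline)) / abs(baseline) * 100

# Approximate r/Anxiety means read off the chart.
pre, during, post = -0.14, -0.23, -0.30

during_change = pct_increase_in_negativity(pre, during)
post_change = pct_increase_in_negativity(pre, post)

print(f"Pre -> During: {during_change:.0f}%")  # about 64%
print(f"Pre -> Post:   {post_change:.0f}%")    # about 114%
```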

Keywords by Subreddit Analysis

Code
stressor_terms = {
    'health_anxiety': ['heart', 'symptoms', 'panic attack', 'panic attacks', 'scared', 'pain', 'health', 'anxious', 'attack'],
    'work_stress': ['job', 'home', 'house', 'wfh', 'remote', 'work'],
    'school_stress': ['school', 'parents', 'mom', 'dad', 'remote school', 'class', 'online class'],
    'burnout': ['tired', 'anymore', 'hate', 'exhausted', 'fucking tired', 'end'],
    'therapy': ['therapist', 'therapy', 'counseling', 'telehealth', 'find help']
}
periods = ['Pre-COVID', 'During COVID', 'Post-COVID']

subreddits = reddit_sent_df['subreddit'].unique()
categories = list(stressor_terms.keys())
results = []

Defines five stressor categories with associated keywords, similar to the previous notebook but focused on comparing how these manifest across different subreddits. This will allow analysis of whether certain communities discuss specific stressors more than others.

Code
def count_total_words(text_series):
    if text_series.empty:
        return 0
    total_words = text_series.astype(str).str.split().str.len().sum()
    return total_words


# This function counts occurrences of a list of keywords
def count_keyword_mentions(text_series, keywords):
    if text_series.empty:
        return 0
    # Build a pattern of the form r'\b(word1|word2|word3)\b', escaping each keyword
    pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    mentions = text_series.astype(str).str.count(pattern, flags=re.IGNORECASE).sum()
    return int(mentions)

Explanation

count_total_words():

Counts the total number of words in a series of texts by splitting each text on whitespace and summing the resulting token counts.

count_keyword_mentions():

Uses a regex to count keyword occurrences, wrapping the alternation in word boundaries (\b) to avoid partial matches such as “end” inside “weekend”. The re.IGNORECASE flag makes the matching case-insensitive.
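The word-boundary behavior can be checked with the standard library alone. This is a small sketch that builds the pattern the same way as count_keyword_mentions(), using a hypothetical three-keyword list and made-up sentences:

```python
import re

# Hypothetical keyword list for illustration.
keywords = ["work", "panic attack", "end"]

# Same construction as count_keyword_mentions: escape each keyword,
# join with "|", and wrap the alternation in word boundaries.
pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"

def count_matches(text):
    """Count non-overlapping, case-insensitive whole-word matches."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

whole_word = count_matches("Work is hard")                # 1: case differs, still matches
substrings = count_matches("Networking this weekend")     # 0: "work"/"end" are only substrings
phrase = count_matches("I had a panic attack at the end of the day")  # 2: phrase + "end"

print(whole_word, substrings, phrase)
```

Multi-word keywords such as "panic attack" work because the boundaries apply only at the start and end of the whole phrase.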

Code
for period in periods:
    for sub in subreddits:
        # Create the subset of data
        subset_df = reddit_sent_df[
            (reddit_sent_df["covid_period"] == period)
            & (reddit_sent_df["subreddit"] == sub)
        ]

        if subset_df.empty:
            continue

        # Get all text and total words for this subset
        text_data = subset_df["full_text"]
        total_words = count_total_words(text_data)

        if total_words == 0:
            continue

        # Calculate frequency for each category
        for category, keywords in stressor_terms.items():
            mentions = count_keyword_mentions(text_data, keywords)
            # Frequency per 1000 words (total_words is non-zero after the check above)
            freq_per_1000 = (mentions / total_words) * 1000

            # Store the result
            results.append(
                {
                    "covid_period": period,
                    "subreddit": sub,
                    "category": category,
                    "frequency_per_1000": freq_per_1000,
                    "total_mentions": mentions,
                    "total_words": total_words,
                }
            )

Explanation

This triple-nested loop iterates through each combination of period, subreddit, and stressor category. For each combination, it:

  1. Filters the data to that specific period and subreddit
  2. Counts total words in that subset
  3. Counts mentions of each stressor category’s keywords
  4. Calculates normalized frequency (per 1000 words) for fair comparison
  5. Stores all results in a list, which is then converted to a DataFrame

This creates a comprehensive dataset showing how often each stressor is mentioned in each subreddit during each period.
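To see why the counts are normalized by word count, consider a small sketch with made-up numbers. The two hypothetical subsets differ tenfold in raw mentions but have the same rate per 1000 words. The function name `frequency_per_1000` is an illustrative stand-in for the inline calculation in the loop.

```python
def frequency_per_1000(mentions, total_words):
    """Keyword mentions per 1000 words; 0.0 when there is no text."""
    if total_words == 0:
        return 0.0
    return mentions * 1000 / total_words

# Hypothetical counts, chosen only to illustrate the normalization.
small_subset = frequency_per_1000(mentions=50, total_words=10_000)
large_subset = frequency_per_1000(mentions=500, total_words=100_000)

print(small_subset, large_subset)
```

Comparing raw mention counts would make the larger subset look ten times more preoccupied with the topic, even though the rate of mentions is the same.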

Code
keyword_freq_df = pd.DataFrame(results)
Code
subreddit_order = sorted(keyword_freq_df['subreddit'].unique())
covid_period_order = ["Pre-COVID", "During COVID", "Post-COVID"]

fig = px.bar(
    keyword_freq_df,
    x='subreddit',
    y='frequency_per_1000',
    color='covid_period',
    facet_col='category',
    facet_col_wrap=3,
    barmode='group',
    
    category_orders={
        'subreddit': subreddit_order,
        'covid_period': covid_period_order
    },
    
    title='Keyword Frequency by Subreddit, Period, and Category',
    labels={
        'subreddit': "Subreddit",
        "frequency_per_1000": "Frequency Per 1000 Words",
        'covid_period': "COVID Period"
    },
    
    color_discrete_sequence=px.colors.sequential.Darkmint_r,
    height=800,
    
    facet_row_spacing=0.15,
    facet_col_spacing=0.05
)

fig.update_xaxes(tickangle=45, matches=None, showticklabels=True)
fig.update_yaxes(matches=None, showticklabels=True)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.update_layout(margin=dict(b=100))

fig.show()

Explanation

Creates a complex faceted bar chart with:

Facets: Separate panels for each stressor category (5 panels total)

X-axis: Subreddits (Anxiety, depression, mentalhealth)

Y-axis: Frequency per 1000 words

Color: COVID period (three shades showing temporal progression)

Layout: 3 columns per row, 800px height

The update calls give each facet its own axis scale (matches=None), rotate the x-axis labels 45 degrees for readability, and strip the “category=” prefix from each facet title.
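Because values in a faceted chart are read by eye, a wide table of the same numbers can be easier to compare. A minimal sketch follows, assuming a frame shaped like keyword_freq_df. The rows in `toy_freq_df` are placeholders, not the project's results.

```python
import pandas as pd

# Toy stand-in for keyword_freq_df; values are placeholders.
toy_freq_df = pd.DataFrame({
    "covid_period": ["Pre-COVID", "During COVID", "Pre-COVID", "During COVID"],
    "subreddit": ["Anxiety", "Anxiety", "depression", "depression"],
    "category": ["burnout"] * 4,
    "frequency_per_1000": [2.0, 2.5, 4.0, 5.0],
})

# One row per (category, subreddit), one column per period.
wide = toy_freq_df.pivot_table(
    index=["category", "subreddit"],
    columns="covid_period",
    values="frequency_per_1000",
)[["Pre-COVID", "During COVID"]]  # restore chronological column order

print(wide.round(2))
```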

Detailed Interpretation of Faceted Charts

Panel 1: Health Anxiety

  • r/depression: Pre-COVID ~2.0; During COVID ~2.3; Post-COVID ~2.2
  • r/mentalhealth: Pre-COVID ~4.8; During COVID ~3.5; Post-COVID ~3.6
  • r/Anxiety: Pre-COVID ~7.0; During COVID ~9.2; Post-COVID ~10.0

Interpretation: r/Anxiety dominates health anxiety language, which makes sense given the subreddit’s focus. Strikingly, r/Anxiety’s health anxiety mentions increased 43% from pre-COVID to post-COVID (7.0 → 10.0 per 1000 words), showing sustained physical symptom focus even after the pandemic. r/mentalhealth actually decreased during COVID, possibly because general wellness discussions were crowded out by more acute concerns. r/depression remained relatively stable, suggesting depression discussions focus less on physical anxiety symptoms.


Panel 2: Work Stress

  • r/depression: ~3.2-3.7 in all periods
  • r/mentalhealth: Pre-COVID ~3.0; During COVID ~3.5; Post-COVID ~3.2
  • r/Anxiety: Pre-COVID ~3.0; During COVID ~5.8; Post-COVID ~4.2

Interpretation: r/Anxiety showed dramatic work stress increase during COVID (nearly doubling from 3.0 to 5.8), reflecting how work-from-home, job insecurity, and workplace changes particularly triggered anxiety. The post-COVID decline to 4.2 suggests partial adaptation but still 40% above baseline. r/depression and r/mentalhealth remained relatively stable, indicating work stress isn’t as central to depression/general mental health discussions.


Panel 3: School Stress

  • r/depression: ~3.8-4.0 in all periods
  • r/mentalhealth: Pre-COVID ~3.5; During COVID ~3.4; Post-COVID ~3.3
  • r/Anxiety: Pre-COVID ~3.4; During COVID ~3.6; Post-COVID ~3.4

Interpretation: School stress showed remarkable stability across all subreddits and periods, hovering around 3.3-4.0 per 1000 words. This suggests academic stress is a constant background factor in these communities, relatively unaffected by COVID. The slight elevation in r/depression might reflect the higher prevalence of depression among students dealing with academic pressures.


Panel 4: Burnout

  • r/Anxiety: Pre-COVID ~2.2; During COVID ~2.2; Post-COVID ~2.2
  • r/depression: Pre-COVID ~4.7; During COVID ~5.0; Post-COVID ~7.6
  • r/mentalhealth: Pre-COVID ~2.9; During COVID ~3.0; Post-COVID ~3.0

Interpretation: This is the most striking panel. r/depression’s burnout language rose sharply post-COVID, up 62% from pre-COVID (4.7 → 7.6 per 1000 words). During COVID it was only slightly elevated (5.0), so nearly all of the increase came afterward. The category counts words like “tired,” “anymore,” “hate,” and “end,” so the rise suggests depression communities are experiencing severe exhaustion, and possibly increased suicidal ideation, in the aftermath of COVID.

Paradoxically, r/Anxiety’s burnout remained completely flat (~2.2 across all periods), suggesting anxiety manifests more as acute distress rather than chronic exhaustion. r/mentalhealth also remained stable, possibly because it’s a more solutions-focused community.
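The category-level counts do not show which individual keywords drive a change. One way to check is to tally each keyword separately, sketched here with the standard library, two made-up posts, and a subset of the burnout keywords. `per_keyword_counts` is a hypothetical helper, not part of the original notebook.

```python
import re
from collections import Counter

# A subset of the burnout keywords, for illustration.
burnout_keywords = ["tired", "anymore", "hate", "end"]

def per_keyword_counts(texts, keywords):
    """Count whole-word, case-insensitive matches of each keyword separately."""
    counts = Counter()
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        counts[keyword] = sum(len(pattern.findall(text)) for text in texts)
    return counts

# Made-up posts for illustration only.
sample_posts = [
    "I am so tired. I can't do this anymore.",
    "Tired again by the end of the week, but the weekend helped.",
]

counts = per_keyword_counts(sample_posts, burnout_keywords)
for keyword, n in counts.most_common():
    print(f"{keyword}: {n}")
```

Because each keyword is matched on its own, overlapping keywords (for example "attack" and "panic attack" in the health anxiety list) can both match the same text. Per-keyword totals may therefore sum to more than the category total produced by the combined pattern.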


Panel 5: Therapy

  • r/Anxiety: Pre-COVID ~0.95; During COVID ~0.7; Post-COVID ~1.0
  • r/depression: Pre-COVID ~0.7; During COVID ~0.9; Post-COVID ~0.9
  • r/mentalhealth: Pre-COVID ~1.2; During COVID ~1.4; Post-COVID ~1.35

Interpretation: r/mentalhealth consistently discusses therapy most (1.2-1.4 per 1000 words), reinforcing its role as a resource-oriented community. The slight increase during COVID likely reflects telehealth discussions.

r/Anxiety shows a U-shaped pattern: therapy mentions dropped during COVID (0.95 → 0.7), possibly because acute crisis posts crowded out treatment discussions, then rebounded post-COVID to baseline levels.

r/depression showed modest increases, suggesting growing treatment-seeking behavior. Overall, therapy language remains relatively low across all communities (under 1.5 per 1000 words), which might indicate barriers to treatment access or stigma.


Cross-Cutting Insights from Subreddit Comparison

1. Subreddit Specialization is Real:

  • r/Anxiety: Highest in health anxiety (10.0) and work stress (5.8 during COVID)
  • r/depression: Highest in burnout (7.6 post-COVID) and school stress (4.0)
  • r/mentalhealth: Highest in therapy discussions (1.4) and least negative sentiment

Each community has distinct concerns, validating their separate existence and suggesting tailored interventions would be more effective than one-size-fits-all approaches.


2. Anxiety Disorders React More Acutely to COVID:

r/Anxiety showed the sharpest changes during COVID across multiple categories (work stress doubled, health anxiety spiked), while r/depression remained more stable during the acute phase but worsened dramatically afterward. This suggests:

  • Anxiety disorders: Reactive to immediate stressors
  • Depressive disorders: Delayed response, accumulated impact

3. Post-COVID Mental Health Crisis is Concentrated in Depression:

The sentiment chart shows all three communities worsened post-COVID, but the burnout data reveals r/depression experienced particularly severe deterioration. The 62% increase in burnout language suggests:

  • Accumulated fatigue from COVID stressors
  • Possible economic hardship consequences
  • Loss of social support structures
  • Increased hopelessness/suicidal ideation (high “end” frequency)

4. r/mentalhealth as a Resilience Factor:

This community consistently shows:

  • Less negative sentiment than disorder-specific subreddits
  • Mostly stable stressor levels across periods (the exception is health anxiety, which fell from ~4.8 to ~3.5 during COVID)
  • Highest rate of therapy mentions
  • Focus on solutions rather than symptoms

This suggests general mental health communities may provide more balanced support than diagnosis-specific spaces, though both serve important roles.


5. The Anxiety-to-Depression Pipeline:

The progression from acute anxiety during COVID to severe burnout/depression post-COVID across the overall dataset might represent a temporal mental health cascade:

  1. Acute COVID stress → increased anxiety (health fears, uncertainty)
  2. Prolonged stress → exhaustion, decreased coping
  3. Post-COVID reality → burnout, depression, hopelessness

This aligns with research showing chronic anxiety can lead to depression when stressors persist without resolution.


Overall Conclusions from Complete Analysis

Synthesizing all four notebooks:

  1. Volume explosion: Mental health discussions increased 18x during COVID and remain 10x elevated

  2. Sentiment deterioration: All communities experienced worsening sentiment, with continued decline post-COVID (especially r/Anxiety at -0.30)

  3. Stressor evolution:

    • Health anxiety peaked post-COVID in r/Anxiety (10.0 per 1000 words)
    • Burnout peaked post-COVID in r/depression (7.6 per 1000 words)
    • Work stress spiked during COVID in r/Anxiety then partially normalized
  4. Community-specific impacts:

    • r/Anxiety: Physical symptoms, acute reactivity, work stress
    • r/depression: Emotional exhaustion, burnout, delayed worsening
    • r/mentalhealth: Resilience, stability, treatment focus
  5. Public health implications:

    • Targeted interventions needed for anxiety vs. depression
    • Telehealth may have improved access (therapy mentions rose in r/depression and r/mentalhealth, though not in r/Anxiety)
    • Post-pandemic recovery is not occurring; crisis is intensifying
    • Suicide prevention efforts should focus on depression communities given high “end” frequency

This comprehensive analysis reveals COVID-19 triggered a profound, multifaceted, and persistent mental health crisis that varies by community but universally shows no signs of resolution. The data suggests we’re experiencing the mental health consequences now, years after the acute pandemic phase.